Szekely and Rizzo present a new interesting measure of correlation. The
idea of using $\int |\phi_n(u,v) - \phi_n^{(1)}(u)\phi_n^{(2)}(v)|^2 \, d\mu(u,v)$, where
$\phi_n$, $\phi_n^{(1)}$, $\phi_n^{(2)}$ are the empirical characteristic functions of a sample
$(X_i, Y_i)$, $i = 1, \ldots, n$, of independent copies of X and Y, is not so novel.
A. Feuerverger considered such measures in a series of papers [4]. Aiyou Chen
and I have actually analyzed such a measure for estimation in [3] in
connection with ICA.

However, the choice of $\mu(\cdot, \cdot)$ which makes the measure scale free, the
extension to $X \in \mathbb{R}^p$, $Y \in \mathbb{R}^q$ and its identification with the Brownian
distance covariance are new, surprising and interesting.
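
The scale-free property can be checked numerically. The sketch below computes a sample distance correlation from double-centered pairwise distance matrices, which is the computational form given in Szekely and Rizzo's paper rather than in this discussion. The function names `_centered_distances` and `distance_correlation` are illustrative, and the V-statistic normalization (dividing by n squared) is an assumption of this sketch.

```python
import numpy as np

def _centered_distances(z):
    """Pairwise Euclidean distances, double-centered by row, column and grand mean."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D sample as n points in R^1
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())

def distance_correlation(x, y):
    """Sample distance correlation of two samples with the same number of rows."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()      # squared sample distance covariance
    dvar_x = (a * a).mean()     # squared sample distance variance of x
    dvar_y = (b * b).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

Rescaling either sample multiplies all of its pairwise distances by the same constant, which cancels in the ratio, so `distance_correlation(x, y)` and `distance_correlation(3 * x, 5 * y)` agree up to rounding.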

There are three other measures available, for general p, q: 

1. The canonical correlation $\rho$ between X and Y.

2. The rank correlation r (for p = q = 1) and its canonical correlation gen- 
eralization. 

3. The Renyi correlation R. 

All vanish along with the Brownian distance (BD) correlation in the case 
of independence and all are scale free. The Brownian distance and Renyi 
covariance are the only ones which vanish iff X and Y are independent. 

However, the three classical measures also give a characterization of
total dependence. If $|\rho| = 1$, X and Y must be linearly related; if $|r| = 1$, Y
must be a monotone function of X; and if R = 1, then either there exist
nontrivial functions f and g such that $P(f(X) = g(Y)) = 1$ or at least there
is a sequence of such nontrivial functions $f_n$, $g_n$ of variance 1 such that
$E(f_n(X) - g_n(Y))^2 \to 0$.

In this respect, by Theorem 4 of Szekely and Rizzo, for the common 
p = q = 1 case, BD correlation does not differ from Pearson correlation. 

Although we found the examples varied and interesting and the
computation of p-values for the BD covariance effective, we are not convinced that
the comparison with the rank and Pearson correlations is quite fair, and
think a comparison to R is illuminating.

Intuitively, the closer the form of observed dependence is to that exhibited 
for the extremal value of the statistic, the more power one should expect. 
Example 1 has Y as a distinctly nonmonotone function of X plus noise, 
a situation where we would expect the rank correlation to be weak and, 
similarly, the other examples correspond to nonlinear relationships between 
X and Y in which we would expect the Pearson correlation to perform badly. 
In general, for goodness of fit, it is important to have statistics with power
in directions which are plausible departures; see Bickel, Ritov and Stoker
[1].

Ying Xu is studying, in the context of high dimensional data, a version 
of empirical Renyi correlation different from that of Breiman and Friedman 
[2]. 

Let $f_1, f_2, \ldots$ be an orthonormal basis of $L_2(P_X)$ and $g_1, g_2, \ldots$ an
orthonormal basis of $L_2(P_Y)$, where $L_2(P_X)$ is the Hilbert space of functions
f such that $E f^2(X) < \infty$ and similarly for $L_2(P_Y)$.

Let the (K, L) approximate Renyi correlation be defined as

$$\max_{\alpha, \beta} \Biggl\{ \operatorname{corr}\Biggl( \sum_{k=1}^{K} \alpha_k f_k(X), \sum_{l=1}^{L} \beta_l g_l(Y) \Biggr) \Biggr\},$$

where corr is Pearson correlation.

This is seen to be the canonical correlation of f(X) and g(Y), where
$f = (f_1, \ldots, f_K)^T$, $g = (g_1, \ldots, g_L)^T$, and is easily calculated as a generalized
eigenvalue problem. The empirical (K, L) correlation is just the solution of
the corresponding empirical problem where the variance covariance matrices
$\mathrm{Var}\, f(X) = E[f_c(X) f_c^T(X)]$, where $f_c(X) = f(X) - E f(X)$, $\mathrm{Var}\, g(Y)$ and
$\mathrm{Cov}(f(X), g(Y))$ are replaced by their empirical counterparts. For $K, L \to \infty$,
the (K, L) correlation tends to the Renyi correlation,

$$R = \max\{\operatorname{corr}(f(X), g(Y)) : f \in L_2(P_X), \; g \in L_2(P_Y)\}.$$
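
As a minimal sketch of the empirical problem, the largest canonical correlation can be obtained from the empirical covariance matrices by solving the eigenproblem $S_{ff}^{-1} S_{fg} S_{gg}^{-1} S_{gf} a = \rho^2 a$. The function name `kl_correlation` and its arguments are hypothetical. The sketch assumes the feature matrices have already been evaluated on the sample and that both empirical variance matrices are nonsingular.

```python
import numpy as np

def kl_correlation(fx, gy):
    """Largest sample canonical correlation between two feature matrices.

    fx: (n, K) array whose columns are f_1(X_i), ..., f_K(X_i)
    gy: (n, L) array whose columns are g_1(Y_i), ..., g_L(Y_i)
    """
    fx = np.asarray(fx, dtype=float)
    gy = np.asarray(gy, dtype=float)
    n = fx.shape[0]
    fc = fx - fx.mean(axis=0)   # centered features f_c(X)
    gc = gy - gy.mean(axis=0)   # centered features g_c(Y)
    s_ff = fc.T @ fc / n        # empirical Var f(X)
    s_gg = gc.T @ gc / n        # empirical Var g(Y)
    s_fg = fc.T @ gc / n        # empirical Cov(f(X), g(Y))
    # The eigenvalues of S_ff^{-1} S_fg S_gg^{-1} S_gf are the squared
    # canonical correlations; the largest one gives the (K, L) correlation.
    m = np.linalg.solve(s_ff, s_fg) @ np.linalg.solve(s_gg, s_fg.T)
    rho2 = np.max(np.linalg.eigvals(m).real)
    return float(np.sqrt(np.clip(rho2, 0.0, 1.0)))
```

With K = L = 1 this reduces to the absolute value of the Pearson correlation of the two feature columns.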

For the empirical (K, L) correlation, K and L have to be chosen in a data 
determined way, although evidently each K, L pair provides a test statistic. 
An even more important choice is that of the $f_k$ and $g_l$ (which need not be
orthonormal but need only have a linear span dense in their corresponding 
Hilbert spaces). 

We compare the performance of these test statistics in the first of the 
Szekely-Rizzo examples in the next section. 

1. Comparison on data example. Here we will investigate the perfor- 
mance of the standard ACE estimate of the Renyi correlation and a version 
of (K, L) correlation in the first of the Szekely-Rizzo examples. 
Fig. 1.

Table 1

                              K = 2, L = 2   K = 3, L = 4   K = 5, L = 5
Estimated (K,L) correlation   0.8160803      0.9170764      0.977163
p-value                       0.002          0.002          <0.001


Breiman and Friedman [2] provided an algorithm, known as alternating 
conditional expectations (ACE), for estimating the transformations $f_0$, $g_0$
and R itself. 

The estimated Renyi correlation is very close to 1 (0.9992669) in this case,
as expected since Y is a function of X plus some noise. Figure 1 shows the
original relationship between X and Y on the left and the relationship
between the estimated transformations $\hat f$ and $\hat g$ on the right. Having computed
$\hat R$, the estimate of R, we compute its significance under the null
hypothesis of independence using the permutation distribution just as Szekely and
Rizzo did. The p-value is <0.001, which is extremely small as it should be.
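
A permutation test of this kind can be sketched as follows. The function name `permutation_pvalue` and the `seed` parameter are hypothetical, and the "add one" convention for the p-value is an assumption of this sketch rather than the convention necessarily used for the reported values. The default of 999 replicates matches the number reported for Table 1.

```python
import numpy as np

def permutation_pvalue(statistic, x, y, n_perm=999, seed=0):
    """Permutation p-value for a dependence statistic (large values = dependence).

    statistic: callable taking (x, y) and returning a float
    x, y: samples of equal length; y is permuted to mimic independence
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        y_perm = y[rng.permutation(len(y))]   # break the pairing of x and y
        if statistic(x, y_perm) >= observed:
            exceed += 1
    # Assumed convention: count the observed sample among the replicates,
    # so the smallest attainable p-value is 1 / (n_perm + 1).
    return (exceed + 1) / (n_perm + 1)
```

Any statistic that is large under dependence can be passed in, for example the absolute Pearson correlation or an estimate of R.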

Next, we compute the empirical (K, L) correlation. Given that the
proposed nonlinear model is

$$y = \frac{\beta_1}{\beta_2} \exp\Biggl\{ -\frac{(x - \beta_3)^2}{2\beta_2^2} \Biggr\} + \varepsilon,$$

we chose, as an orthonormal basis with respect to the Lebesgue measure, one
defined by the Hermite polynomials defined as $H_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n} e^{-x^2/2}$,
for both X and Y. We take $f_j(\cdot)$ ... $g_k(\cdot)$ ...
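
One natural reading of a basis "orthonormal with respect to the Lebesgue measure" built from these polynomials is the family of Hermite functions $\psi_n(x) = H_n(x) e^{-x^2/4} / \sqrt{n! \sqrt{2\pi}}$. The sketch below evaluates them with NumPy's probabilists' Hermite module, whose polynomials match the definition of $H_n$ above. This normalization, the function name `hermite_features` and the choice to start at n = 0 are assumptions of the sketch, not necessarily the authors' choice.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermevander

def hermite_features(x, k):
    """Return an (n, k) array with columns psi_0(x), ..., psi_{k-1}(x).

    psi_n(x) = He_n(x) * exp(-x^2 / 4) / sqrt(n! * sqrt(2 * pi)), where He_n
    is the probabilists' Hermite polynomial of degree n.
    """
    x = np.asarray(x, dtype=float)
    he = hermevander(x, k - 1)   # columns He_0(x), ..., He_{k-1}(x)
    norms = np.array([
        math.sqrt(math.factorial(n) * math.sqrt(2.0 * math.pi))
        for n in range(k)
    ])
    return he * np.exp(-x ** 2 / 4.0)[:, None] / norms
```

In practice the data would probably need to be centered and scaled before being passed in, since these functions decay quickly away from zero; that preprocessing is left out here.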
Table 1 gives the results for different combinations of K and
L. As before, the p-value is computed by a permutation test, based on 999
replicates.

The value, not surprisingly, is close to R, for K = L = 5. 